This report gives a brief summary of the textual analysis of the submissions to the Discourse discussion on Invasive Species by the Environmental Audit Select Committee, and the tweets using the #TransportInvasiveSpecies hashtag on Twitter.
Most comments were posted between the 10th and 17th May, with spikes in activity on the 15th and 10th May, which received 46 and 14 comments respectively. Comments were spread throughout the day, with 3pm being the most popular time for users to be online. Users were also very active at 6-7am, 9am, and 11am; this pattern of peaks and troughs suggests they were revisiting the platform regularly throughout the day to respond to comments, especially in the morning.
The Discourse comments had an average of 87 words each compared to 24 on Twitter, which is a normal pattern to observe. The average Flesch readability score was 54, suggesting readers needed to be educated to at least a UK Grade Level of 12 to understand the comments.
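As a rough illustration of how a readability score like the one above can be obtained, the sketch below applies the standard Flesch Reading Ease formula, 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word). The syllable counter is a simple vowel-group heuristic and the function names are hypothetical; the report's own tooling is not stated, so its scores may differ slightly from this sketch.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count vowel groups, with a silent-e adjustment."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent "e" (as in "native") as non-syllabic.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
```

Scores between 50 and 60 are conventionally described as fairly difficult to read, which is the band the average of 54 falls into.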
The most common adjectives, phrases and pairs of words are displayed below. People tend to express their emotions through the adjectives they use, and in this case the frequent use of “local”, “responsible”, and “dangerous” relates to the main concern of the discussion, the safety aspects of pavement parking. The phrases “yellow line”, “local authority”, and “mobility scooter” show that a range of topics within the area were also being discussed.
A network of the most frequent consecutive word pairs (bigrams) is shown below.
“grey squirrels”, “red squirrels”, and “invasive species” are the most common word pairs in the Discourse dataset. A cluster comprising phrases such as “alien/invasive/native species” shows a difference in the terminology used in the discussion, while another cluster of phrases surrounding squirrels refers to the primary debate. “Japanese knotweed”, “asian hornet” and “climate change” are also common bigrams and suggest alternative discussions that were ongoing independently of the squirrel debate.
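The bigram counts behind a network like this can be sketched as follows. This is a minimal illustration, not the report's actual pipeline: the stopword list is a tiny placeholder, and the function name `top_bigrams` is hypothetical.

```python
import re
from collections import Counter

# Illustrative stopword list only; a real analysis would use a fuller one.
STOPWORDS = {"the", "and", "are", "a", "an", "of", "to", "in", "is"}


def top_bigrams(comments, n=10, stopwords=STOPWORDS):
    """Return the n most frequent consecutive word pairs across all comments."""
    counts = Counter()
    for comment in comments:
        tokens = re.findall(r"[a-z']+", comment.lower())
        # Pair each token with its successor within the same comment.
        for first, second in zip(tokens, tokens[1:]):
            if first in stopwords or second in stopwords:
                continue  # skip pairs that include a stopword
            counts[(first, second)] += 1
    return counts.most_common(n)


comments = [
    "Grey squirrels displace red squirrels.",
    "Grey squirrels are an invasive species.",
]
print(top_bigrams(comments, n=3))
```

In a network plot, each word becomes a node and each frequent pair becomes an edge, so words that share many partners (such as "species") sit at the centre of a cluster.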
The Twitter discussion showed many retweets between different users including “wcl_news”, “ukladybirds”, and “tthecccuk”. There were also repeated mentions of “prof helen roy”, and discussions about the evidence sessions held by the committee. Different species were raised on Twitter, such as plants, animals, and fungi, and concerns such as “growing threat”, “uk economy 2bn”, and “harm pathogens” show the range of topics which were being discussed using the #TransportInvasiveSpecies hashtag during the week.
Within the Discourse platform, 3 topics were created by the Transport team considering different areas of pavement parking. A plot of the 10 words most associated with each of the 3 topics is shown below. Each coloured bar chart represents a single topic.
A brief summary of those topics is:
| Topic Number | Common words |
|---|---|
| Topic 1 | hornets, honey bees, encroachment, relocated |
| Topic 2 | license, pregnant, welsh, traps |
| Topic 3 | disgusting, book, financial, UK, balance |
In this case, topics 4 and 2 were mainly about the red and grey squirrel debate, while topics 1 and 6 centred on other species such as hornets, bees, and ducks. The model also extracted some comments expressing participants’ anger at having their comments flagged as inappropriate by other users of the platform.
Following the link below will provide an alternative topic model visualisation, which is split into two sections:
Left - showing topic distances from each other based on the types of words in each,
Right - showing the top 30 word pairs in each topic (red bar) and overall in the dataset (blue bar). I recommend setting the relevance metric to 0.6 to get a more representative list of words in each topic.
This visualisation is interactive: hover over each topic number to view the words in each topic, or select each word to view which topics it is relevant to.
https://nicolednisbett.github.io/Transport/#topic=0&lambda=0.60&term=
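The effect of the relevance setting can be sketched in code. This assumes the linked visualisation follows the LDAvis convention, where the relevance of a term to a topic is λ·log p(term | topic) + (1 - λ)·log(p(term | topic) / p(term)); the function names and the probabilities in the example are hypothetical and chosen only for illustration.

```python
import math


def relevance(p_term_given_topic: float, p_term: float, lam: float = 0.6) -> float:
    """LDAvis-style relevance: a blend of within-topic probability and lift."""
    lift = p_term_given_topic / p_term
    return lam * math.log(p_term_given_topic) + (1 - lam) * math.log(lift)


def rank_terms(topic_probs, corpus_probs, lam=0.6, top_n=30):
    """Rank the terms of one topic by relevance, highest first."""
    scored = {
        term: relevance(p, corpus_probs[term], lam)
        for term, p in topic_probs.items()
        if p > 0
    }
    return sorted(scored, key=scored.get, reverse=True)[:top_n]


# Made-up probabilities: "species" is common everywhere, "hornet" is
# rarer overall but concentrated in this topic.
topic_probs = {"species": 0.30, "hornet": 0.10}
corpus_probs = {"species": 0.30, "hornet": 0.02}
print(rank_terms(topic_probs, corpus_probs, lam=1.0))
print(rank_terms(topic_probs, corpus_probs, lam=0.0))
```

With λ = 1 terms are ranked purely by how frequent they are within the topic, while with λ = 0 they are ranked purely by how distinctive they are to it. A value of 0.6 sits between the two, which is why it tends to give a more representative list.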
The wordcloud below gives the most popular words associated with positive and negative sentiments in the survey. Specific comments which are associated with the most popular sentiments are listed below.
The NRC sentiment lexicon uses a categorical scale to measure 2 sentiments (positive and negative) and 8 emotions (anger, anticipation, disgust, trust, joy, sadness, fear, and surprise). Examples of words and comments in these sentiment categories are below.
In the Discourse debate, the majority of submissions were equally negative and positive but also categorised as trust, sadness, and fear.
On the other hand, the Twitter discussion was much more positive overall, with anticipation and fear being other prominent sentiments expressed. However, there were far fewer tweets than Discourse comments, so this could affect the accuracy of the results.
Hover over the plot below to read the content of the comments within each sentiment category.
| Sentiment | Output after `[1] 7` | Output after `[1] 8` |
|---|---|---|
| anger | 0.07615894 | 0.11182109 |
| anticipation | 0.09933775 | 0.12140575 |
| disgust | 0.06291391 | 0.01597444 |
| fear | 0.12251656 | 0.12460064 |
| joy | 0.07947020 | 0.11182109 |
| negative | 0.12582781 | 0.15654952 |
| positive | 0.12913907 | 0.15974441 |
| sadness | 0.10596026 | 0.04153355 |
| surprise | 0.06953642 | 0.01277955 |
| trust | 0.12913907 | 0.14376997 |
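The ten values in each set of sentiment proportions above sum to 1, which is consistent with counting lexicon matches per category and dividing by the total number of matches. A minimal sketch of that calculation follows. The lexicon here is a hypothetical three-word toy, not the real NRC word list, and the function name is an assumption.

```python
import re
from collections import Counter

# Hypothetical toy lexicon for illustration only; NOT the real NRC lexicon.
TOY_LEXICON = {
    "threat": ["fear", "negative"],
    "protect": ["trust", "positive"],
    "hope": ["anticipation", "joy", "positive"],
}


def sentiment_proportions(comments, lexicon=TOY_LEXICON):
    """Share of lexicon matches falling into each sentiment category."""
    counts = Counter()
    for comment in comments:
        for token in re.findall(r"[a-z']+", comment.lower()):
            # A word may belong to several categories; count each one.
            for category in lexicon.get(token, []):
                counts[category] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in sorted(counts.items())}


print(sentiment_proportions(["A growing threat", "We must protect native species"]))
```

Because one word can carry several labels, a comment can contribute to both a sentiment (positive or negative) and one or more emotions at once.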